Abstract: Today's Internet often suffers transient outages, but as increasingly critical services migrate to
the cloud, much higher levels of Internet availability will be necessary. The stunning shift toward cloud
computing has created new pressures on the Internet. Loads are soaring, and many applications
increasingly depend on real-time data streaming. Unfortunately, the reliability of Internet streaming
leaves much to be desired. Here, we focus on routing in the Internet's core, at extremely high data rates
(all-to-all data rates of 40 Gbits per second are common today, with 100 Gbits/s within sight). These kinds
of routers are typically implemented as clusters of computers and line cards: in effect, a data center
dedicated to network routing. The architecture is such that individual components can fail without
bringing the whole operation to a halt. For example, network links are redundant; if one link fails, there
will usually be a backup. Such a router could even run routing protocols of different types side by side,
making the actual routing decisions by consensus: if some protocol instance malfunctions, its peers
would simply outvote it. But suppose that a routing protocol (for clarity, we focus on the Border Gateway
Protocol [BGP], implemented by a BGP daemon [BGPD] hosted on some node within the router) needs to
be restarted after a crash, updated with a software patch, or migrated within the cluster.

Keywords: Cloud Computing, Network, Routing, Clusters, Border Gateway Protocol.




Cloud computing, particularly in conjunction with increased device mobility, is reshaping the Internet.
We're seeing unprecedented shifts in demand patterns, a broad spectrum of new quality expectations,
and a realignment of the entire field's economics. The implications are far-reaching.

The main text of this article focuses on high availability, one of several key properties today's cloud
computing applications demand. The need is most obvious in voice-over-IP (VoIP) telephony and video
streaming: for such uses, even the briefest disruptions can cause connections to seize up or fail in ways
that are highly visible to the end user.

If we can crack the "high-availability barrier," we can imagine a future in which the Internet carries all such
traffic. Many cloud computing uses are so important (both in terms of their scale and the associated
revenue streams) that unless the Internet can evolve to meet the demands, the associated cloud computing
enterprises might consider building new networks that would be dedicated to their use.

Core routers are already built to tolerate component failures. For example, network links are redundant;
if one link fails, there will usually be a backup. Such a router could even run routing protocols of different
types side by side, making the actual routing decisions by consensus: if some protocol instance
malfunctions, its peers would simply outvote it. But suppose that a routing protocol (for clarity, we focus
on the Border Gateway Protocol [BGP], implemented by a BGP daemon [BGPD] hosted on some node within
the router) needs to be restarted after a crash, updated with a software patch, or migrated within the
cluster.

A Close Look at BGP 

BGP is a very robust and scalable routing protocol, as evidenced by the fact that it is the routing protocol
employed on the Internet. At the time of this writing, the Internet BGP routing tables number more than
90,000 routes. To achieve scalability at this level, BGP uses many route parameters, called attributes, to
define routing policies and maintain a stable routing environment.

In addition to BGP attributes, classless inter-domain routing (CIDR) is used by BGP to reduce the size of the
Internet routing tables. For example, assume that an ISP owns the IP address block 195.10.x.x from the
traditional Class C address space. This block consists of 256 Class C address blocks, 195.10.0.x through
195.10.255.x. Assume that the ISP assigns a Class C block to each of its customers. Without CIDR, the ISP
would advertise 256 Class C address blocks to its BGP peers. With CIDR, BGP can supernet the address
space and advertise one block, 195.10.x.x. This block is the same size as a traditional Class B address block.
The class distinctions are rendered obsolete by CIDR, allowing a significant reduction in the BGP routing
tables.

BGP neighbors exchange full routing information when the TCP connection between neighbors is
first established. When changes to the routing table are detected, the BGP routers send to their neighbors
only those routes that have changed. BGP routers do not send periodic routing updates, and BGP routing
updates advertise only the optimal path to a destination network.
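The CIDR supernetting example above can be checked with a short sketch. It assumes the ISP's block is 195.10.0.0/16 and that each customer holds one /24 (a former Class C block); the variable names are illustrative only, and the standard-library `ipaddress` module stands in for what a router would do internally.

```python
import ipaddress

# The 256 former Class C blocks, 195.10.0.0/24 through 195.10.255.0/24,
# one per customer (an assumption made for this illustration).
customer_blocks = [ipaddress.ip_network(f"195.10.{i}.0/24") for i in range(256)]

# collapse_addresses merges adjacent networks into the smallest covering set,
# which is the supernetting step that CIDR makes possible.
advertised = list(ipaddress.collapse_addresses(customer_blocks))

print(f"without CIDR: {len(customer_blocks)} advertisements")
print(f"with CIDR:    {len(advertised)} advertisement(s): {advertised[0]}")
```

The single remaining prefix covers 65,536 addresses, the same size as a traditional Class B block.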



A BGP router can communicate with other BGP routers in its own AS or in other ASs. Both I-BGP and
E-BGP implement the BGP protocol, with a few different rules. All I-BGP-speaking routers within the same
AS must peer with each other in a fully connected mesh. They are not required to be physical neighbors;
they only need to maintain a TCP connection as a reliable transport mechanism. Because there is no loop
detection mechanism in I-BGP, I-BGP-speaking routers must not forward any third-party routing information
to their peers. In contrast, E-BGP routers are able to advertise third-party information to their E-BGP peers
by default. Figure 1 shows routers R1, R2, and R3 using I-BGP to exchange routing information within the same
AS.



BGP is designed for use in networks composed of interconnected autonomous systems (ASs). An AS could
be a network operated by some ISP, or might be a campus or corporate network. BGP maintains a table
of IP networks, or "prefixes," that represent paths to a particular AS or set of ASs, tracking both direct
neighbors and more remote ones. A BGPD instance runs on a router and uses path availability, network
policies, or operator-defined databases of routing rules to select preferred routes.
II. Internal and External BGP





Figure 1. Internal BGP (I-BGP) versus external BGP (E-BGP).




exchange routing information between ASs.
III. Operation

BGP neighbors, called peers, are established by manual configuration between routers to create a TCP session
on port 179. A BGP speaker sends 19-byte keep-alive messages every 30 seconds to maintain the
connection. [3] Among routing protocols, BGP is unique in using TCP as its transport protocol. When BGP
runs between two peers in the same autonomous system (AS), it is referred to as Internal
BGP (iBGP or Interior Border Gateway Protocol). When it runs between different autonomous systems, it is
called External BGP (eBGP or Exterior Border Gateway Protocol). Routers on the boundary of one AS
exchanging information with another AS are called border or edge routers, or simply eBGP peers, and are
typically connected directly, while iBGP peers can be interconnected through other intermediate routers.
Other deployment topologies are also possible, such as running eBGP peering inside a VPN tunnel, allowing
two remote sites to exchange routing information in a secure and isolated manner. The main difference
between iBGP and eBGP peering is in the way routes that were received from one peer are propagated to
other peers. For instance, new routes learned from an eBGP peer are typically redistributed to all
iBGP peers as well as all other eBGP peers (if transit mode is enabled on the router). However, if new routes
were learned on an iBGP peering, then they are re-advertised only to eBGP peers. These route-propagation
rules effectively require that all iBGP peers inside an AS are interconnected in a full mesh.
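The propagation rules just described, and the full-mesh requirement they imply, can be sketched as follows. This is a simplified model under stated assumptions: peers are represented as a plain dictionary, and the function names `readvertise_to` and `full_mesh_sessions` are hypothetical, not part of any BGP implementation.

```python
from itertools import combinations

def readvertise_to(learned_from, peers, transit=True):
    """Return the peers a route is passed on to.

    peers maps a peer name to "ibgp" or "ebgp"; learned_from is the
    name of the peer the route was received from.
    """
    source_kind = peers[learned_from]
    targets = []
    for name, kind in peers.items():
        if name == learned_from:
            continue
        if source_kind == "ebgp":
            # eBGP-learned routes go to every iBGP peer, and to the
            # other eBGP peers only when transit mode is enabled.
            if kind == "ibgp" or transit:
                targets.append(name)
        elif kind == "ebgp":
            # iBGP-learned routes are re-advertised only to eBGP peers.
            targets.append(name)
    return sorted(targets)

def full_mesh_sessions(routers):
    """List the iBGP sessions needed so every router peers with every other."""
    return list(combinations(sorted(routers), 2))

peers = {"A": "ebgp", "B": "ebgp", "C": "ibgp", "D": "ibgp"}
print(readvertise_to("A", peers))                 # learned over eBGP
print(readvertise_to("C", peers))                 # learned over iBGP
print(len(full_mesh_sessions(["R1", "R2", "R3", "R4"])))
```

Because a route learned over iBGP is never passed on to another iBGP peer, each router must hear it directly from the router that learned it, so n routers need n(n-1)/2 sessions.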



Filtering routes learned from peers, and transforming them before redistribution to peers or before plumbing
them into the routing table, is typically controlled via the route-maps mechanism. These are basically rules
that apply certain actions to routes matching certain criteria on either the ingress or egress path.
These rules can specify that the route is to be dropped or, alternatively, that its attributes are to be modified.
It is usually the responsibility of the AS administrator to provide the desired route-map configuration on a
router supporting BGP.

Finite-state machines

In order to make decisions in its operations with peers, a BGP
peer uses a simple finite state machine (FSM) that consists of six states: Idle; Connect; Active; OpenSent;
OpenConfirm; and Established. For each peer-to-peer session, a BGP implementation maintains a state
variable that tracks which of these six states the session is in.




Figure 2. BGP state machine 



BGP defines the messages that each peer should exchange in order to change the session from one state
to another. The first state is the "Idle" state. In the "Idle" state, BGP initializes all resources, refuses all
inbound BGP connection attempts and initiates a TCP connection to the peer. The second state is
"Connect". In the "Connect" state, the router waits for the TCP connection to complete and transitions to
the "OpenSent" state if successful. If unsuccessful, it starts the ConnectRetry timer and transitions to the
"Active" state upon expiration. In the "Active" state, the router resets the ConnectRetry timer to zero and
returns to the "Connect" state. In the "OpenSent" state, the router sends an Open message and waits for
one in return in order to transition to the "OpenConfirm" state. Keepalive messages are exchanged and,
upon successful receipt, the router is placed into the "Established" state. In the "Established" state, the
router can send and receive Keepalive, Update, and Notification messages to and from its peer.
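The state transitions described above can be captured in a small table-driven sketch. The six state names come from the text; the event names (`start`, `tcp_ok`, and so on) are assumptions invented for this illustration, and error handling is reduced to a single rule: anything unexpected returns the session to Idle.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()
    OPEN_CONFIRM = auto()
    ESTABLISHED = auto()

# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,
    (State.CONNECT, "tcp_ok"): State.OPEN_SENT,
    (State.CONNECT, "connect_retry_expired"): State.ACTIVE,
    (State.ACTIVE, "retry"): State.CONNECT,
    (State.OPEN_SENT, "open_received"): State.OPEN_CONFIRM,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}

def step(state, event):
    """Return the next state; errors and unexpected events fall back to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)

def run(events, state=State.IDLE):
    """Feed a sequence of events through the FSM and return the final state."""
    for event in events:
        state = step(state, event)
    return state

happy_path = ["start", "tcp_ok", "open_received", "keepalive_received"]
print(run(happy_path).name)
print(run(["start", "connect_retry_expired", "retry", "tcp_ok"]).name)
```

A table keyed by (state, event) keeps the per-session state variable explicit and makes the fall-back-to-Idle rule a single default rather than a branch in every state.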






Idle State

Refuses all incoming BGP connections.
Starts the initialization of event triggers.
Initiates a TCP connection with its configured BGP peer.
Listens for a TCP connection from its peer.
Changes its state to Connect.
If an error occurs at any state of the FSM process, the BGP session is terminated
immediately and returned to the Idle state. Some of the reasons why a router does not progress
from the Idle state are:

TCP port 179 is not open.
A random TCP port over 1023 is not open.
Peer address configured incorrectly on either router.
AS number configured incorrectly on either router.

Connect State

Waits for successful TCP negotiation with peer.
BGP does not spend much time in this state if the TCP session has been successfully established.
Sends Open message to peer and changes state to OpenSent.
If an error occurs, BGP moves to the Active state. Some reasons for the error are:

TCP port 179 is not open.
A random TCP port over 1023 is not open.
Peer address configured incorrectly on either router.
AS number configured incorrectly on either router.




Active State

If the router was unable to establish a successful TCP session, then it ends up in the Active state.
Repeated failures may result in a router cycling between the Idle and Active states. Some of the
reasons for this include:

TCP port 179 is not open.




OpenSent State

BGP FSM listens for an Open message from its peer.
Once the message has been received, the router checks the validity of the Open message.
If there is an error, it is because one of the fields in the Open message does not match between the
peers, e.g., BGP version mismatch, MD5 password mismatch, or the peering router expects a different
My AS. The router then sends a Notification message to the peer indicating why the error
occurred.
If there is no error, a Keepalive message is sent, various timers are set and the state is changed to
OpenConfirm.

OpenConfirm State

The peer is listening for a Keepalive message from its peer.
If a Keepalive message is received and no timer has expired before reception of the Keepalive, BGP
transitions to the Established state.
If a timer expires before a Keepalive message is received, or if an error condition occurs, the router
transitions back to the Idle state.

Established State

In this state, the peers send Update messages to exchange information about each route being
advertised to the BGP peer.
If there is any error in the Update message then a Notification message is sent to the peer, and BGP
transitions back to the Idle state.
If a timer expires before a Keepalive message is received, or if an error condition occurs, the router
transitions back to the Idle state.
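The Keepalive message that keeps an Established session alive is small enough to build by hand. The sketch below assumes the fixed BGP message header layout from RFC 4271 (a 16-byte marker of all ones, a 2-byte length, and a 1-byte type, with type 4 meaning Keepalive); the helper names `build_keepalive` and `parse_header` are hypothetical.

```python
import struct

BGP_MARKER = b"\xff" * 16          # 16-byte marker field, all bits set
HEADER_LEN = 19                    # marker (16) + length (2) + type (1)
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

def build_keepalive():
    """A Keepalive is just the fixed header with no body."""
    return BGP_MARKER + struct.pack("!HB", HEADER_LEN, 4)

def parse_header(data):
    """Return (length, type name) from the first 19 bytes of a BGP message."""
    marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != BGP_MARKER:
        raise ValueError("bad marker")
    return length, MESSAGE_TYPES.get(msg_type, "UNKNOWN")

message = build_keepalive()
print(len(message), parse_header(message))
```

The header alone accounts for the 19-byte size of a Keepalive message.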




Figure 3. Growth on the Internet




Figure 4. Number of AS on the Internet vs. number of registered AS.
One of the largest problems faced by BGP, and indeed the Internet infrastructure as a whole, is the growth
of the Internet routing table. If the global routing table grows to the point where some older, less capable
routers cannot cope with the memory requirements or the CPU load of maintaining the table, these routers
will cease to be effective gateways between the parts of the Internet they connect. In addition, and perhaps
even more importantly, larger routing tables take longer to stabilize (see above) after a major connectivity
change, leaving network service unreliable, or even unavailable, in the interim.



Until late 2001 the global routing table was growing exponentially, threatening an eventual widespread
breakdown of connectivity. In an attempt to prevent this, ISPs cooperated in keeping the global routing
table as small as possible, by using Classless Inter-Domain Routing (CIDR) and route aggregation. While
this slowed the growth of the routing table to a linear process for several years, with the expanded demand
for multihoming by end user networks the growth was once again superlinear by the middle of 2004. A full
IPv4 BGP table as of September 2012 is in excess of 430,000 prefixes.



Route summarization is often used to improve aggregation of the BGP global routing table, thereby
reducing the necessary table size in routers of an AS. Consider that AS1 has been allocated the big address
space of 172.16.0.0/16. This would be counted as one route in the table, but due to customer requirements or
traffic engineering purposes, AS1 wants to announce smaller, more specific routes of 172.16.0.0/18,
172.16.64.0/18 and 172.16.128.0/18. The prefix 172.16.192.0/18 does not have any hosts, so AS1 does not
announce a specific route 172.16.192.0/18. This all counts as AS1 announcing four routes.

AS2 will see the four routes from AS1 (172.16.0.0/16, 172.16.0.0/18, 172.16.64.0/18 and 172.16.128.0/18) and
it is up to the routing policy of AS2 to decide whether or not to take a copy of the four routes or, as
172.16.0.0/16 overlaps all the other specific routes, to just store the summary, 172.16.0.0/16.

If AS2 wants to send data to prefix 172.16.192.0/18, it will be sent to the routers of AS1 on route
172.16.0.0/16. At AS1's router, it will either be dropped or a destination unreachable ICMP message will be
sent back, depending on the configuration of AS1's routers.

If AS1 later decides to drop the route 172.16.0.0/16, leaving 172.16.0.0/18, 172.16.64.0/18 and
172.16.128.0/18, AS1 will drop the number of routes it announces to three. AS2 will see the three routes, and
depending on the routing policy of AS2, it will store a copy of the three routes, or aggregate the prefixes
172.16.0.0/18 and 172.16.64.0/18 to 172.16.0.0/17, thereby reducing the number of routes AS2 stores to only
two: 172.16.0.0/17 and 172.16.128.0/18.

If AS2 wants to send data to prefix 172.16.192.0/18, it will be dropped or a destination unreachable ICMP
message will be sent back at the routers of AS2 (not AS1 as before), because 172.16.192.0/18 would not be in
the routing table.
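The aggregation and lookup behaviour in this example can be reproduced with a short sketch. It assumes AS2's table can be modelled as a plain list of prefixes with longest-prefix-match lookup; `aggregate` and `lookup` are hypothetical helper names, and the standard-library `ipaddress` module does the prefix arithmetic.

```python
import ipaddress

def aggregate(prefixes):
    """Merge adjacent or overlapping prefixes into the smallest covering set."""
    networks = (ipaddress.ip_network(p) for p in prefixes)
    return list(ipaddress.collapse_addresses(networks))

def lookup(table, address):
    """Longest-prefix match: the most specific route containing the address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

specifics = ["172.16.0.0/18", "172.16.64.0/18", "172.16.128.0/18"]

# Case 1: AS1 announces the /16 summary plus the three specifics.
with_summary = [ipaddress.ip_network(p) for p in ["172.16.0.0/16"] + specifics]

# Case 2: AS1 withdraws the /16 and AS2 aggregates what is left.
without_summary = aggregate(specifics)

print(without_summary)
print(lookup(with_summary, "172.16.192.1"))     # still covered by the summary
print(lookup(without_summary, "172.16.192.1"))  # no route in AS2's table
```

With the summary present, traffic for the unused quarter follows the /16 into AS1; once it is withdrawn, the lookup fails in AS2 itself.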
IV. Future Work



Previous work used poisoning as a measurement tool to uncover hidden network topology and to assess
the prevalence of default routes. While inspired by this work, ours differs in that we propose using
poisoning operationally as a means to improve Internet availability.

Ongoing work seeks to verify the origin of BGP updates. By allowing an AS to poison only prefixes it originates, our
approach is consistent with that goal. Proposals to verify the entire path are also consistent with our general 
approach, if we consider the poison as a (validated) hint from the origin AS to the rest of the network that a 
particular AS is not correctly routing its traffic. By the time such proposals are deployed, it should be 
feasible to develop new routing primitives or standardized BGP. 
V. Conclusion



Cloud computing is the most network-centric compute paradigm to date. A successful transition to the cloud
will depend on a rock-solid network foundation. Today's cloud computing systems are appealing for their
low cost of ownership, amazing scalability, and flexibility. The cloud even brings environmental benefits:
users share computing resources, which are used more efficiently, and the data centers are typically located
near power generating sources. However, network routing instabilities make the cloud less reliable than it
needs to be.